It is an honor to comment on Prof. Efron's latest
contribution to the merging of frequentist and
Bayesian thinking into a harmonious (even if not 
strictly coherent) statistical viewpoint. I will review 
my thinking along those lines and some inspirations 
for it. I agree with most of Dr. Efron's views ex- 
pressed here and in Efron (2005), with these im- 
portant exceptions: First, I disagree that frequen- 
tism has supplied a good set of working rules. In- 
stead, I argue that frequentism has been a prime 
source of reckless overconfidence in many fields (es- 
pecially but not only in the form of 0.05-level test- 
ing; see Rothman, Greenland and Lash, 2008, Chap- 
ter 10 for examples and further citations). I also dis- 
agree that Bayesians are more aggressive than fre- 
quentists in modeling. The most aggressive model- 
ing is that which fixes unknown parameters at some 
known constant like zero (whence they disappear 
from the model and are forgotten), thus generating 
overconfident inferences and an illusion of simplic- 
ity; such practice is a hallmark of conventional fre- 
quentist applications in observational studies. 

As working rules, the problem with conventional 
methods lies not so much with frequentism, but 
rather with frequentist tools for designed experi- 
ments being misapplied to observational data 
(Greenland, 2005a). Bayesians can and do misap- 
ply their methods similarly; they just haven't been 
given as much opportunity to do so. Conversely, 
many frequentist as well as Bayesian tools for obser- 
vational studies have been developed, especially for 
sensitivity analysis. But the overconfidence problem 
has been perpetuated by the ongoing concealment 
of unbelievable point-mass priors within models in
order to maintain frequentist identification of target 
parameters. 

The problem can be addressed by sacrificing iden- 
tification and replacing bad modeling assumptions 
with explicit and reasonable priors (Gustafson, 2005; 
Greenland, 2005a, 2009a). Perhaps ironically, fre- 
quentist thought experiments and simulations can 
then provide both contextual and frequentist diag- 
nostics (Rubin, 1984; Greenland, 2006; Gustafson 
and Greenland, 2009). Thus, frequentist thinking 
can address Bayesian overconfidence just as Bayesian 
thinking can address frequentist overconfidence. 
Hence I would strengthen Box's plea for ecumenism 
(Box, 1983) into an imperative to fuse Bayesian and 
frequentist concepts and methods in statistical 
inference — and in teaching as well. This theme is far 
from new (e.g., besides Box, see Good, 1983; Diaco- 
nis and Freedman and discussants, 1986; Samaniego 
and Reneau, 1994), yet it has barely touched every- 
day teaching and practice. In this case (unlike many) 
that is not because of software limitations; in fact, 
for the bulk of applications the same software can be 
used for both frequentist and Bayesian calculations 
(Greenland, 2007, 2009a). 

HIERARCHICAL MODELING: WHERE PRIORS 
AND FREQUENCIES MEET 

Bayesian and frequentist ideas intertwine in hi- 
erarchical modeling (Efron's Section 9), which en- 
compasses both Bayesian and empirical-Bayes ap- 
proaches (Good, 1983, 1987) as well as other shrink- 
age techniques. Efron and Morris (1973, 1975) were 
among the earliest to demonstrate convincingly that 
hierarchical models offered practical as well as the- 
oretical advantages for data analysis. Their writings 
(along with those of Jack Good, George Box and Ed- 
ward Leamer) inspired my applications of hierarchi- 
cal modeling and Bayesian methods in epidemiology, 
where the hierarchy levels are naturally determined 
by physical structures and observation processes. 

As an example, in nearly all observational stud- 
ies of nutrient effects, individual risks are regressed 
directly on nutrient intakes calculated from food
intakes. This conventional model makes no further
use of the food intakes, and so assumes implicitly 
that foods have no effect on risk beyond their calcu- 
lated nutrient content. This is an unsupported and 
very doubtful assumption. A more realistic model al- 
lows food effects beyond measured-nutrient content. 
However, the resulting two-level hierarchical model 
is not identified without a prior because nutrient 
intakes are linear functions of food intakes (mak- 
ing nutrient and food intakes completely collinear). 
Using any contextually defensible prior reveals that 
the conventional analysis generates overconfident in- 
ferences, both in the Bayesian sense of overstating 
information (Greenland, 2000), and also in the fre- 
quentist sense of producing interval undercoverage 
(Gustafson and Greenland, 2006). That overconfi- 
dence may explain the rather embarrassing track 
record of nutritional epidemiology when compared 
against clinical trials (Lawlor et al., 2004). Ecologic 
analyses provide other examples in which use of the 
natural hierarchical structure with explicit priors is 
needed to avoid overconfidence (Wakefield, 2009). 
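
The non-identification described above can be seen in a minimal sketch, assuming a hypothetical study with three foods and one nutrient whose content per serving is given by made-up weights (none of these numbers come from the cited studies). Because the nutrient intake is an exact linear combination of the food intakes, shifting the nutrient coefficient by any amount and compensating in the food coefficients leaves every fitted value unchanged, so the data alone cannot separate the two sets of effects.

```python
# Hypothetical illustration: all numbers below are made up.
# Servings per day of three foods for four subjects.
food_intake = [
    [1.0, 0.0, 2.0],
    [0.5, 1.5, 0.0],
    [2.0, 1.0, 1.0],
    [0.0, 3.0, 0.5],
]
# Assumed nutrient content per serving of each food.
nutrient_per_serving = [10.0, 4.0, 7.0]


def nutrient_intake(foods):
    """Nutrient intake as a linear function of food intakes."""
    return sum(c * f for c, f in zip(nutrient_per_serving, foods))


def linear_predictor(foods, beta_nutrient, gamma_food):
    """Nutrient effect plus food effects beyond nutrient content."""
    food_part = sum(g * f for g, f in zip(gamma_food, foods))
    return beta_nutrient * nutrient_intake(foods) + food_part


def shifted(beta_nutrient, gamma_food, delta):
    """Move delta of effect from the foods onto the nutrient."""
    new_gamma = [g - delta * c for g, c in zip(gamma_food, nutrient_per_serving)]
    return beta_nutrient + delta, new_gamma


beta, gamma = 0.02, [0.1, -0.3, 0.0]
beta2, gamma2 = shifted(beta, gamma, delta=0.05)

fits_a = [linear_predictor(row, beta, gamma) for row in food_intake]
fits_b = [linear_predictor(row, beta2, gamma2) for row in food_intake]

# The two parameter sets differ, yet the fitted values agree
# up to floating-point rounding.
max_gap = max(abs(a - b) for a, b in zip(fits_a, fits_b))
print(beta, beta2, max_gap)
```

Setting the food coefficients to zero, as the conventional model does, picks one point on this line of equally well-fitting solutions; a prior on the food effects is what replaces that arbitrary choice.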

In this work, I have come to appreciate that a 
simultaneously Bayesian and frequentist viewpoint 
is essential for a credible analysis of observational 
data. I must be at least informally Bayesian, know- 
ing that there is no contextual credibility without 
consideration and use of prior information, espe- 
cially in model specification. But I should also be 
at least informally frequentist, knowing that priors 
should be weighted lightly unless they derive from 
statistical observations such as frequencies in par- 
tially exchangeable past experience (e.g., surveys) 
or classical measurement processes (e.g., laboratory 
determinations). Most of all, I should not rigidly ad- 
here to ideologies or models, especially when a clash 
between my prior and my likelihood function shows 
that my understanding of the situation is more de- 
ficient than I initially thought (Box, 1980, 1990). 

PRIORS: EVERYBODY USES THEM (BUT 
MOST CALL THEM "MODELS") 

As Efron illustrates in Section 4, all analyses la- 
beled as frequentist are built on priors, although 
these priors are called "models," which avoids the 
controversies associated with overtly Bayesian anal- 
ysis (Leamer, 1978; Box, 1980, 1983). Even the sim- 
plest randomized-trial analysis is based on a model, 
namely the prior belief that treatment was random- 
ized fairly, and the reported subjects actually exist. 
As numerous cases of fraud demonstrate, that be- 
lief may be mistaken more often than those receiv- 
ing medical treatment would like to think (e.g., see 
Greenland, 2009b). 

Labeling assumptions and models as prior beliefs 
might better alert us to the act of faith involved in 
their use. As Box (1980) said:

I believe that it is impossible logically to distin- 
guish between model assumptions and the prior 
distribution of the parameters. The model is 
the prior in the wide sense that it is a probabil- 
ity statement of all the assumptions currently 
to be tentatively entertained a priori. On this 
view, traditional sampling theory was of course 
not free from assumptions of prior knowledge. 
Instead it was as if only two states of mind had 
been allowed: complete certainty or complete 
uncertainty. 

I have grown increasingly uncomfortable with the 
convention of failing to label models as priors. It en- 
courages the use of arbitrary constraints, and ques- 
tions constraints only if the analysis data (the direct 
evidence) can reveal departures — even though stud- 
ies are not designed with anywhere near sufficient 
power to reveal all important model violations. The 
representation of modeling constraints in belief net- 
works (Madigan, Mosurski and Almond, 1997) can 
aid in the display of these constraints as imposed 
beliefs and thus expose implausible aspects of the 
model, although of course it cannot address data 
limitations. Yet single datasets are often too limited 
to tell us much about either the effects under study 
or our models (Robins and Greenland, 1986) — at 
least if we do not impose a horde of dubious
independence constraints that amount to point-mass
priors with no supporting data. 

Additivity in generalized linear models is an ex- 
ample: with n covariates, additivity sets all orders 
of product terms ("interactions") among them to 
zero, and is equivalent to using a point mass at 
zero for the joint prior on these terms. Entering the 
few "significant" two-way products hardly makes a 
dent in this set of constraints if n > 5; yet n > 8 
is common and n > 20 not unusual. Arbitrary ad- 
ditive constraints can be relatively harmless when 
estimating a population-average effect, because the 
specification error they entail may average out in 
much the way random residual error does (Green- 
land and Maldonado, 1994). But the constraints can 
be deadly when used for individual (clinical) risk
prediction, as adverse drug interactions demonstrate. 
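
To give a sense of scale for the additivity constraint, the following sketch counts the product terms that additivity sets to zero. It assumes products of distinct covariates only (no powers of a single covariate), in which case the number of product terms of all orders among n covariates is 2^n - n - 1; the function name is hypothetical.

```python
from math import comb


def product_term_counts(n):
    """Return (two-way products, products of all orders) among n covariates.

    Counts products of distinct covariates only; powers are ignored.
    """
    two_way = comb(n, 2)
    all_orders = 2 ** n - n - 1  # subsets of size 2 or more
    return two_way, all_orders


for n in (5, 8, 20):
    two_way, all_orders = product_term_counts(n)
    print(n, two_way, all_orders)
```

Under this counting, n = 5 gives 10 two-way products out of 26 product terms, n = 8 gives 28 out of 247, and n = 20 gives 190 out of 1,048,555, so even entering every two-way product would relax only a small fraction of the constraints once n is moderately large.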

Hierarchical methods offer one way to relax ad- 
ditivity constraints in a controlled fashion, by in- 
cluding all or many products but shrinking their 
estimates toward zero or a second-level structure 
(Wakefield, De Vocht and Hung, 2010). More gen- 
erally, we can expand an unrealistic conventional 
model by embedding it in a richer, more realistic 
hierarchical model, then shrink estimates from the 
latter using prior distributions. Aspects of these dis- 
tributions may be chosen to improve frequency per- 
formance in high-dimensional problems, but such 
methods do not preclude the use of prior informa- 
tion to judge those and other aspects of the formal 
prior distribution. 

THE NEED FOR EXPLICIT PRIORS IN 
OBSERVATIONAL STUDIES 

My discomfort with conventional treatments of 
modeling has increased knowing that observational 
data analysis can identify causal effects only by us- 
ing indirect evidence, no matter how large the dataset 
or how informed by past observational data. This 
is the usual situation in epidemiology, where con- 
founders, selection-probability ratios, or valid ex- 
posure measurements are unavailable for analysis 
(Greenland, 2005a; Gustafson, 2005; Rothman, 
Greenland and Lash, 2008, Chapter 19; Lash, Fox 
and Fink, 2009). The problem is a variant of the non- 
identifiability of a regression coefficient when some 
regressors are latent (Leamer, 1974). In these cases a 
credible formal analysis must introduce proper pri- 
ors in place of overconfident identifying constraints. 

Use of identified regression models as sources of 
effect estimates corresponds to a multidimensional 
point prior that says there is no uncontrolled con- 
founding or selection bias, and that measurements 
(including validation measurements) were accurate 
or at least reliable for life histories. Taken jointly, 
these assumptions are absurd in topics like nutri- 
tional and "lifestyle" epidemiology. But relaxing 
these silly and harmful assumptions leads to a realm 
where most Bayesians as well as frequentists fear to 
tread: Specification of prior distributions that can- 
not be effectively checked or updated with the anal- 
ysis data. 

When the scientific validity of each analysis hinges 
on extensive and untestable prior specifications, an 
analysis can be no more than a rough guess about a 
vast unknown, and represents but one element in a 
sensitivity analysis (Greenland, 2005b). This is true 
even of a formal sensitivity analysis, which is lim- 
ited to examining a few parameters lest it become 
unintelligible. In this reality, the importance of spe- 
cific models and priors should be de-emphasized in 
favor of providing a framework for sensitivity anal- 
ysis across plausible models and priors. Accuracy of 
computation becomes secondary to prior specifica- 
tion, which is too often neglected under the rubric 
of "objective Bayes" (a.k.a. "please don't bother me 
with the science" Bayes). 

There is simply no point in trying to do well at all 
conceivable parameter values given the model when 
the model embedding the parameter has already 
imposed doubtful point constraints. Hence I have 
sought approaches in which informative priors are 
central. Good (1983) provided the key ingredients: 
Priors can be transformed into penalty functions, 
which can then be transformed into "prior data" 
that generate the penalties as log-likelihood con- 
tributions. This transformation allows evaluation of 
prior-knowledge claims in a currency familiar to the 
subject-matter expert, as well as use of familiar and 
rapid fitting methods for basic models (Bedrick, 
Christensen and Johnson, 1996, 1997; Greenland, 
2006, 2007, 2009a, 2009c). 
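
The chain from prior to penalty to prior data can be illustrated with a minimal sketch for a single risk parameter. This is an assumed toy case (a beta-type prior on a binomial risk, with made-up counts), not a reproduction of the methods in the works cited: the log prior acts as a penalty added to the log-likelihood, and the same penalty arises from appending prior "successes" and "failures" to the observed records and fitting by ordinary maximum likelihood.

```python
from math import log


def log_likelihood(p, successes, failures):
    """Binomial log-likelihood, up to an additive constant."""
    return successes * log(p) + failures * log(1.0 - p)


def penalized_log_likelihood(p, successes, failures, prior_s, prior_f):
    """Log-likelihood plus the log prior (the penalty), up to a constant."""
    penalty = prior_s * log(p) + prior_f * log(1.0 - p)
    return log_likelihood(p, successes, failures) + penalty


def maximize(fn, lo=1e-9, hi=1.0 - 1e-9, iters=200):
    """Ternary search; valid here because the objective is concave in p."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if fn(m1) < fn(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0


# Made-up numbers: 3 events among 20 subjects observed;
# a prior expressed as 2 events among 40 hypothetical prior subjects.
obs_s, obs_f = 3, 17
prior_s, prior_f = 2, 38

# Route 1: maximize the penalized log-likelihood directly.
penalized_estimate = maximize(
    lambda p: penalized_log_likelihood(p, obs_s, obs_f, prior_s, prior_f)
)

# Route 2: append the prior data and take the ordinary ML estimate.
augmented_estimate = (obs_s + prior_s) / (obs_s + obs_f + prior_s + prior_f)

print(penalized_estimate, augmented_estimate)
```

The two routes agree up to the numerical tolerance of the search, which is the sense in which ordinary fitting software applied to the augmented records can carry out the penalized (posterior-mode) analysis in this simple case.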

Note that conversion of priors to prior data does 
not require conjugacy; it only requires that the penal- 
ties have representations as transformed likelihoods 
from a series of observations or experiments. The 
credibility of the prior may be questioned if such a 
representation is absent, arcane, or absurd. Evalua- 
tion of priors in terms of equivalent data is partic- 
ularly illuminating in human-subject fields, where 
data are expensive and hence sparse. Here, strong 
priors may be seen as claiming access to a volume of 
data that does not exist, thus casting doubt on prior 
assertions of some experts (Higgins and Spiegelhal- 
ter, 2002; Greenland, 2006). 
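
As a rough sketch of evaluating a prior in terms of equivalent data, consider a normal prior on a log rate ratio specified by 95% prior limits. The calculation below is an approximation resting on stated assumptions, namely prior limits symmetric on the log scale and the usual large-sample variance 1/A + 1/A = 2/A for a log rate ratio estimated from A cases in each of two groups; the function name and the example limits are hypothetical.

```python
from math import log


def equivalent_cases_per_group(lower, upper, z=1.96):
    """Approximate cases per group implied by a normal prior on a log rate ratio.

    Assumes 95% prior limits (lower, upper) on the ratio scale and the
    large-sample variance 2/A for a log rate ratio from A cases per group.
    """
    prior_sd = (log(upper) - log(lower)) / (2.0 * z)
    prior_variance = prior_sd ** 2
    return 2.0 / prior_variance


vague = equivalent_cases_per_group(0.25, 4.0)   # limits of 1/4 to 4
strong = equivalent_cases_per_group(0.8, 1.25)  # limits of 0.8 to 1.25
print(round(vague, 1), round(strong, 1))
```

Under these assumptions, prior limits of 1/4 to 4 correspond to about 4 cases per group, whereas limits of 0.8 to 1.25 claim information comparable to roughly 150 cases per group, a claim that an expert can then be asked to justify.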

When priors (the indirect evidence) are recali- 
brated to match the frequentist outputs of reason- 
ably sized thought experiments, the combined evi- 
dence will often be too limited to distinguish among 
the effect sizes at issue (Greenland, 2009c). This is 
unwelcome news to some colleagues, albeit no news 
to others. Regardless, the future of indirect evidence 
should be recognition for what it is: Omnipresent 
and essential for any inference beyond "more research
is needed" (which may be the strongest conclusion
we can hope to wrest from most studies, albeit
not always justifiable in economic terms). 
Thus I would conclude by echoing Efron: Whether
Bayesian, frequentist, ecumenic, or syncretic, statis- 
ticians need to become better at creating and evalu- 
ating contextually informed models — which include 
both well-informed prior distributions and sensible 
qualitative structures. It follows that statistical train- 
ing should introduce informative-Bayesian methods 
in tandem with classical (and often destructive) fre- 
quentist methods, rather than as an afterthought 
or specialty topic. Data priors provide one easy and 
natural way to do so, displaying as they do the sym- 
metry between indirect and direct evidence, and ex- 
posing priors to a new angle of criticism. 